Team Member :
Dr Omar bin Nazmi
Dr Ahmad Syahid bin Ibrahim
Dr Mohd Hilmi bin Mat Husain
Survival analysis provides estimation of the association between risk factors and time-to-evet and make prediction of subject’s survival probabilities. Cox propotional hazard model or cox refression is a semiparametric model used to investigare the association between outcomes and predictors factors. unlike parametric model,in Cox PH , it holds assumption of :
This exercise will provide practical on Semiparametric model for Survival Analysis. The analysis will be using data : stroke_fatality.dta The event variable is status( dead vs censored)
Load Package
library(haven)
## Warning: package 'haven' was built under R version 4.4.2
library(broom)
## Warning: package 'broom' was built under R version 4.4.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gtsummary)
## Warning: package 'gtsummary' was built under R version 4.4.2
library(ggplot2)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.2
## corrplot 0.95 loaded
Load package for survival Analysis
library(survival)
library(survminer)
## Warning: package 'survminer' was built under R version 4.4.2
## Loading required package: ggpubr
## Warning: package 'ggpubr' was built under R version 4.4.2
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
library(haven)
stroke_fatalityssss <- read_dta("stroke_fatality.dta")
glimpse(stroke_fatalityssss)
## Rows: 226
## Columns: 49
## $ sex <dbl+lbl> 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1,…
## $ race <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ doa <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2 <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2 <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ gcs <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm <dbl+lbl> 1, NA, 1, 8, NA, 2, 1, 1, 2, 2, 2, 2, NA, N…
## $ hpt <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, …
## $ ckd <dbl+lbl> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, N…
## $ af <dbl+lbl> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, N…
## $ hd <dbl+lbl> NA, NA, 2, 8, NA, 2, 8, 1, 2, 2, 2, 2, NA, …
## $ dyslipid <dbl+lbl> NA, NA, 1, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, N…
## $ tia <dbl+lbl> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, N…
## $ smoker <dbl+lbl> NA, NA, 2, 8, NA, 2, 8, 2, 2, 1, 2, 2, NA, N…
## $ hb <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2 <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2 <dbl+lbl> 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1,…
## $ marital2 <dbl+lbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 4, 3, 1,…
## $ status1b <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ status2b <dbl+lbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ status3b <dbl+lbl> 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,…
## $ hpt2 <dbl+lbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, …
## $ marital3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2 <dbl+lbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ sex3 <dbl+lbl> 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1,…
## $ referral2 <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <dbl+lbl> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,…
## $ icd10cat <dbl+lbl> 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,…
## $ icd10cat2 <dbl+lbl> 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,…
## $ icd10cat3 <dbl+lbl> 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1,…
## $ dm2cat <dbl> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
stroke.fatality <- stroke_fatalityssss %>%
mutate_if(is.labelled, ~as_factor(.))
glimpse(stroke.fatality)
## Rows: 226
## Columns: 49
## $ sex <fct> female, male, female, female, male, female, female, femal…
## $ race <fct> malay, malay, malay, malay, malay, chinese, malay, malay,…
## $ doa <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2 <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2 <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ gcs <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm <fct> yes, NA, yes, 8, NA, 2, yes, yes, 2, 2, 2, 2, NA, NA, yes…
## $ hpt <fct> yes, yes, yes, yes, yes, 2, yes, yes, yes, 2, yes, 2, yes…
## $ ckd <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ af <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ hd <fct> NA, NA, 2, 8, NA, 2, 8, yes, 2, 2, 2, 2, NA, yes, yes, NA…
## $ dyslipid <fct> NA, NA, yes, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, N…
## $ tia <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ smoker <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, yes, 2, 2, NA, NA, 8, NA, N…
## $ hb <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2 <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2 <fct> female, male, female, female, male, female, female, femal…
## $ marital2 <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, yes, 4, 4, 4, 3, yes,…
## $ status1b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status2b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status3b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ hpt2 <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, no, yes, no, …
## $ marital3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2 <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes…
## $ sex3 <fct> female, male, female, female, male, female, female, femal…
## $ referral2 <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <fct> GP/Home/Missing, GP/Home/Missing, GP/Home/Missing, hospit…
## $ icd10cat <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat2 <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat3 <fct> "CI,Others", "CI,Others", "ICB, Other Haemorrhage", "ICB,…
## $ dm2cat <dbl> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Change format to factor ( for variable dm2cat, hpt2cat, dyslipid2cat)
library(dplyr)
stroke.fatality1 <- stroke.fatality %>%
mutate(
dm2cat = as_factor(dm2cat),
hpt2cat = as_factor(hpt2cat),
dyslipid2cat = as_factor(dyslipid2cat)
)
str(stroke.fatality1$dm2cat)
## Factor w/ 2 levels "0","1": 2 1 2 1 1 1 2 2 1 1 ...
levels(stroke.fatality1$dm2cat)
## [1] "0" "1"
class(stroke.fatality1$dm2cat)
## [1] "factor"
stroke.fatality1 %>%
tbl_summary()
| Characteristic | N = 2261 |
|---|---|
| sex | |
| male | 97 (43%) |
| female | 129 (57%) |
| race | |
| malay | 211 (93%) |
| chinese | 11 (4.9%) |
| indians | 2 (0.9%) |
| others | 2 (0.9%) |
| not recorded | 0 (0%) |
| missing | 0 (0%) |
| date of admission | 2011-10-29 (2011-06-10, 2012-03-12) |
| date of discharge | 2011-11-02 (2011-06-18, 2012-03-16) |
| time | 4 (2, 7) |
| time2 | 4 (2, 7) |
| status at discharge | |
| alive | 173 (77%) |
| dead | 53 (23%) |
| earliest Glasgow Coma Scale | 15.0 (10.0, 15.0) |
| Unknown | 1 |
| earliest systolic BP (mmHg) | 161 (143, 186) |
| Unknown | 1 |
| earliest diastolic BP (mmHg) | 91 (79, 104) |
| Unknown | 1 |
| heart rate | 80 (68, 95) |
| Unknown | 2 |
| diabetes | |
| no | 9 (5.8%) |
| yes | 74 (48%) |
| 2 | 44 (28%) |
| 8 | 28 (18%) |
| Unknown | 71 |
| hypertension | |
| no | 2 (1.0%) |
| yes | 157 (75%) |
| 2 | 37 (18%) |
| 8 | 13 (6.2%) |
| Unknown | 17 |
| chronic kidney disease | |
| no | 11 (10%) |
| yes | 10 (9.2%) |
| 2 | 44 (40%) |
| 8 | 44 (40%) |
| Unknown | 117 |
| atrial fibrillation | |
| no | 10 (9.7%) |
| yes | 5 (4.9%) |
| 2 | 44 (43%) |
| 8 | 44 (43%) |
| Unknown | 123 |
| heart ds | |
| no | 11 (8.7%) |
| yes | 30 (24%) |
| 2 | 46 (37%) |
| 8 | 39 (31%) |
| Unknown | 100 |
| dyslipidemia | |
| no | 10 (8.1%) |
| yes | 27 (22%) |
| 2 | 43 (35%) |
| 8 | 43 (35%) |
| Unknown | 103 |
| transient ischaemic attack | |
| no | 11 (11%) |
| yes | 0 (0%) |
| 2 | 44 (44%) |
| 8 | 46 (46%) |
| Unknown | 125 |
| smoking | |
| no | 11 (9.9%) |
| yes | 22 (20%) |
| 2 | 44 (40%) |
| 8 | 34 (31%) |
| Unknown | 115 |
| hemoglobin | 13.40 (11.60, 14.65) |
| Unknown | 6 |
| platelet | 235 (186, 277) |
| Unknown | 6 |
| total white cell count | 9.9 (7.5, 12.7) |
| Unknown | 6 |
| natrium | 138.0 (136.0, 140.0) |
| Unknown | 1 |
| potassium | 3.80 (3.50, 4.20) |
| Unknown | 1 |
| glucose | 5.8 (4.9, 8.8) |
| Unknown | 200 |
| capillary blood sugar | 8.1 (6.4, 12.1) |
| Unknown | 17 |
| fasting tot chol(mmol/l) | 6.05 (4.95, 6.80) |
| Unknown | 186 |
| fasting TG (mmol/l) | 1.30 (0.90, 1.90) |
| Unknown | 187 |
| urea | 6.5 (4.8, 9.0) |
| Unknown | 1 |
| referral | |
| 1 | 37 (17%) |
| 2 | 6 (2.8%) |
| 3 | 50 (23%) |
| 4 | 123 (57%) |
| Unknown | 10 |
| transport | |
| 1 | 97 (45%) |
| 2 | 119 (55%) |
| Unknown | 10 |
| age in years | 61 (52, 69) |
| sex | |
| male | 97 (43%) |
| female | 129 (57%) |
| marital status | |
| no | 0 (0%) |
| yes | 10 (4.5%) |
| 2 | 1 (0.4%) |
| 3 | 10 (4.5%) |
| 4 | 202 (91%) |
| Unknown | 3 |
| status (alive, dead, etc) | |
| alive | 173 (77%) |
| dead | 53 (23%) |
| others | 0 (0%) |
| absconded | 0 (0%) |
| status2b | |
| alive | 173 (77%) |
| dead | 53 (23%) |
| status @discharge 1=dead, 0=alive | |
| alive | 173 (77%) |
| dead | 53 (23%) |
| hpt2 | 148 (75%) |
| Unknown | 28 |
| marital3 | 205 (91%) |
| race2 | 211 (93%) |
| male=1, female=0 | |
| female | 129 (57%) |
| male | 97 (43%) |
| referral2 | |
| 0 | 37 (17%) |
| 1 | 6 (2.8%) |
| 2 | 50 (23%) |
| 3 | 123 (57%) |
| Unknown | 10 |
| 0 is hospital, 1=GP or home or missing | |
| hospital | 87 (38%) |
| GP/Home/Missing | 139 (62%) |
| 0=Cerebral Isch , 1=Haemorrhagic, 2= Other | |
| CI,Others | 145 (64%) |
| Haemorrhagic | 77 (34%) |
| Others | 4 (1.8%) |
| 0=Cerebral Isch and others, 1=Haemorrhagic | |
| CI,Others | 149 (66%) |
| Haemorrhagic | 77 (34%) |
| 0=Cerebral Isch or others, 1=SAH, 2=Haemorrhagic | |
| CI,Others | 149 (66%) |
| SAH | 19 (8.4%) |
| ICB, Other Haemorrhage | 58 (26%) |
| dm2cat | |
| 0 | 152 (67%) |
| 1 | 74 (33%) |
| hpt2cat | |
| 0 | 78 (35%) |
| 1 | 148 (65%) |
| dyslipid2cat | |
| 0 | 199 (88%) |
| 1 | 27 (12%) |
| 1 n (%); Median (Q1, Q3) | |
glimpse(stroke.fatality1)
## Rows: 226
## Columns: 49
## $ sex <fct> female, male, female, female, male, female, female, femal…
## $ race <fct> malay, malay, malay, malay, malay, chinese, malay, malay,…
## $ doa <date> 2011-01-01, 2011-01-03, 2011-01-06, 2011-01-16, 2011-01-…
## $ dod <date> 2011-01-05, 2011-01-06, 2011-01-22, 2011-02-08, 2011-01-…
## $ time <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ time2 <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status2 <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ gcs <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sbp <dbl> 150, 152, 231, 110, 199, 190, 145, 161, 222, 161, 149, 15…
## $ dbp <dbl> 87, 108, 117, 79, 134, 101, 102, 96, 129, 107, 90, 61, 95…
## $ hr <dbl> 92, 87, 64, 90, 72, 63, 102, 81, 72, 94, 59, 81, 61, 120,…
## $ dm <fct> yes, NA, yes, 8, NA, 2, yes, yes, 2, 2, 2, 2, NA, NA, yes…
## $ hpt <fct> yes, yes, yes, yes, yes, 2, yes, yes, yes, 2, yes, 2, yes…
## $ ckd <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ af <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ hd <fct> NA, NA, 2, 8, NA, 2, 8, yes, 2, 2, 2, 2, NA, yes, yes, NA…
## $ dyslipid <fct> NA, NA, yes, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, N…
## $ tia <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, 2, 2, 2, NA, NA, 8, NA, NA,…
## $ smoker <fct> NA, NA, 2, 8, NA, 2, 8, 2, 2, yes, 2, 2, NA, NA, 8, NA, N…
## $ hb <dbl> 10.4, 13.0, 11.0, 14.3, 15.7, 11.7, 13.4, 11.8, 12.6, 16.…
## $ plat <dbl> 249, 156, 179, 233, 351, 133, 290, 251, 196, 188, 139, 30…
## $ wbc <dbl> 12.5, 7.4, 22.4, 9.6, 18.7, 11.3, 15.8, 8.5, 9.0, 9.5, 11…
## $ na <dbl> 138, 132, 135, 132, 138, 140, 134, 135, 129, 137, 141, 13…
## $ potas <dbl> 3.6, 4.1, 4.7, 3.8, 3.8, 3.0, 4.1, 4.0, 4.0, 3.7, 4.1, 4.…
## $ gluc <dbl> NA, 5.1, NA, NA, NA, NA, NA, NA, NA, 5.5, NA, NA, NA, NA,…
## $ cbs <dbl> 11.4, NA, 18.6, 6.8, 6.5, 8.4, 13.4, 14.9, 6.6, 5.8, 7.7,…
## $ chol <dbl> NA, 4.7, NA, NA, NA, NA, NA, NA, NA, 6.9, NA, NA, NA, NA,…
## $ tg <dbl> NA, 1.1, NA, NA, NA, NA, NA, NA, NA, 1.0, NA, NA, NA, NA,…
## $ urea <dbl> 5.1, 8.4, 11.0, 7.4, 3.8, 4.0, 3.3, 5.6, 5.8, 5.8, 7.3, 7…
## $ referral <dbl> 4, 4, 4, 3, 1, 2, 4, 3, 3, 4, 1, 4, 1, 3, 4, 3, 3, 1, 4, …
## $ transport <dbl> 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, …
## $ age2 <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ sex2 <fct> female, male, female, female, male, female, female, femal…
## $ marital2 <fct> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, yes, 4, 4, 4, 3, yes,…
## $ status1b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status2b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ status3b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ hpt2 <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, no, yes, no, …
## $ marital3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, …
## $ race2 <fct> yes, yes, yes, yes, yes, no, yes, yes, yes, yes, yes, yes…
## $ sex3 <fct> female, male, female, female, male, female, female, femal…
## $ referral2 <dbl> 3, 3, 3, 2, 0, 1, 3, 2, 2, 3, 0, 3, 0, 2, 3, 2, 2, 0, 3, …
## $ referral2cat <fct> GP/Home/Missing, GP/Home/Missing, GP/Home/Missing, hospit…
## $ icd10cat <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat2 <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ icd10cat3 <fct> "CI,Others", "CI,Others", "ICB, Other Haemorrhage", "ICB,…
## $ dm2cat <fct> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ hpt2cat <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <fct> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Variable selection
we will select importance variables for analysis. A multi-centered prospective cohort study has reported that determinants for stroke fatality involve patient-level and system-level determinants (Sarfo, Fred S et al. 2023).
selected variable :
1. important variables for stroke fatality (Sarfo, Fred S et al. 2023 )
2. Complete dataset
variable selected : 1. time : duration (days) = DOD-DOA 2. status3b : status of the patient at discharge : dead or alive 3. age2 (numerical): age in years 4. gcs (numerical ) : gcs score 5. sex (categorical): Male, female 6.icd10cat : category if diagnosis ICD10 : inschaemia(CI) and others, haemorhagic 7. dm2cat : DM status ( categorical, 0 = No, 1 = Yes) 8. hpt2cat : Hypertension status ( categorical, 0 = No, 1 = Yes ) 9. dyslipid2cat : Dsylipidemia status ( categorical, 0 = No, 1 = Yes )
Outcome event of interest( status ) : Dead
stroke1 <- stroke.fatality1 %>%
dplyr::select(time, status3b, age2, gcs, sex, dm2cat, icd10cat2, hpt2cat, dyslipid2cat)
glimpse(stroke1)
## Rows: 226
## Columns: 9
## $ time <dbl> 4, 3, 16, 23, 5, 4, 22, 14, 4, 2, 22, 2, 43, 2, 4, 4, 3, …
## $ status3b <fct> dead, alive, alive, alive, alive, dead, alive, dead, aliv…
## $ age2 <dbl> 69, 64, 63, 67, 66, 78, 55, 65, 67, 56, 50, 78, 59, 83, 8…
## $ gcs <dbl> 15, 15, 15, 11, 15, 7, 5, 13, 15, 15, 10, 15, 14, 9, 15, …
## $ sex <fct> female, male, female, female, male, female, female, femal…
## $ dm2cat <fct> 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, …
## $ icd10cat2 <fct> "CI,Others", "CI,Others", "Haemorrhagic", "Haemorrhagic",…
## $ hpt2cat <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, …
## $ dyslipid2cat <fct> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
summary(stroke1)
## time status3b age2 gcs sex
## Min. : 0.000 alive:173 Min. :20.00 Min. : 3.00 male : 97
## 1st Qu.: 2.250 dead : 53 1st Qu.:52.00 1st Qu.:10.00 female:129
## Median : 4.000 Median :61.00 Median :15.00
## Mean : 6.469 Mean :60.82 Mean :12.44
## 3rd Qu.: 7.000 3rd Qu.:68.75 3rd Qu.:15.00
## Max. :61.000 Max. :94.00 Max. :15.00
## NA's :1
## dm2cat icd10cat2 hpt2cat dyslipid2cat
## 0:152 CI,Others :149 0: 78 0:199
## 1: 74 Haemorrhagic: 77 1:148 1: 27
##
##
##
##
##
Missing data in view of missing a single value of data in covariate gcs, we will do imputation using single imputation method.
stroke1$gcs[is.na(stroke1$gcs)] <- mean(stroke1$gcs, na.rm = TRUE)
summary(stroke1)
## time status3b age2 gcs sex
## Min. : 0.000 alive:173 Min. :20.00 Min. : 3.00 male : 97
## 1st Qu.: 2.250 dead : 53 1st Qu.:52.00 1st Qu.:10.00 female:129
## Median : 4.000 Median :61.00 Median :15.00
## Mean : 6.469 Mean :60.82 Mean :12.44
## 3rd Qu.: 7.000 3rd Qu.:68.75 3rd Qu.:15.00
## Max. :61.000 Max. :94.00 Max. :15.00
## dm2cat icd10cat2 hpt2cat dyslipid2cat
## 0:152 CI,Others :149 0: 78 0:199
## 1: 74 Haemorrhagic: 77 1:148 1: 27
##
##
##
##
tbl_summary(stroke1,
by = status3b,
statistic = list(all_continuous() ~ "{mean} ({sd})", all_categorical() ~ "{n} ({p}%)")
) %>%
modify_header(label = "**Variable**") %>%
modify_spanning_header(c("stat_1", "stat_2") ~ "**Status**") %>%
modify_caption("Summary of Data by Status")
| Variable |
Status
|
|
|---|---|---|
| alive N = 1731 |
dead N = 531 |
|
| time | 6 (7) | 8 (9) |
| age in years | 60 (14) | 62 (15) |
| earliest Glasgow Coma Scale | 13.7 (2.7) | 8.5 (4.0) |
| sex | ||
| male | 82 (47%) | 15 (28%) |
| female | 91 (53%) | 38 (72%) |
| dm2cat | ||
| 0 | 113 (65%) | 39 (74%) |
| 1 | 60 (35%) | 14 (26%) |
| 0=Cerebral Isch and others, 1=Haemorrhagic | ||
| CI,Others | 132 (76%) | 17 (32%) |
| Haemorrhagic | 41 (24%) | 36 (68%) |
| hpt2cat | ||
| 0 | 60 (35%) | 18 (34%) |
| 1 | 113 (65%) | 35 (66%) |
| dyslipid2cat | ||
| 0 | 147 (85%) | 52 (98%) |
| 1 | 26 (15%) | 1 (1.9%) |
| 1 Mean (SD); n (%) | ||
KM1 <- survfit(Surv(time = time, status3b == 'dead') ~ 1,
type = "kaplan-meier", data = stroke1)
summary(KM1)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ 1,
## data = stroke1, type = "kaplan-meier")
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 226 3 0.987 0.00761 0.9719 1.000
## 1 221 10 0.942 0.01559 0.9120 0.973
## 2 197 4 0.923 0.01797 0.8884 0.959
## 3 169 4 0.901 0.02060 0.8616 0.942
## 4 133 4 0.874 0.02403 0.8282 0.922
## 5 92 5 0.827 0.03071 0.7685 0.889
## 6 67 3 0.789 0.03601 0.7220 0.863
## 7 58 4 0.735 0.04259 0.6561 0.823
## 9 44 2 0.702 0.04675 0.6157 0.800
## 10 38 1 0.683 0.04903 0.5935 0.786
## 12 34 4 0.603 0.05742 0.5001 0.727
## 14 25 2 0.555 0.06213 0.4452 0.691
## 18 20 1 0.527 0.06492 0.4138 0.671
## 22 16 1 0.494 0.06870 0.3761 0.649
## 25 10 2 0.395 0.08321 0.2615 0.597
## 28 6 1 0.329 0.09178 0.1907 0.569
## 29 5 1 0.263 0.09413 0.1308 0.531
## 41 3 1 0.176 0.09528 0.0606 0.509
The survival probabilites can be presented in plot .
ggsurvplot(KM1, data = stroke1, risk.table = TRUE, linetype = c(1,2), pval = TRUE)
## Warning in .pvalue(fit, data = data, method = method, pval = pval, pval.coord = pval.coord, : There are no survival curves to be compared.
## This is a null model.
KM1.sex <- survfit(Surv(time = time, status3b == 'dead') ~ sex,
type = "kaplan-meier", data = stroke1)
summary(KM1.sex)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ sex,
## data = stroke1, type = "kaplan-meier")
##
## sex=male
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 95 4 0.958 0.0206 0.918 0.999
## 3 71 2 0.931 0.0275 0.879 0.986
## 5 32 1 0.902 0.0391 0.828 0.982
## 6 27 2 0.835 0.0581 0.729 0.957
## 7 21 1 0.795 0.0676 0.673 0.939
## 9 15 1 0.742 0.0813 0.599 0.920
## 10 13 1 0.685 0.0929 0.525 0.894
## 12 10 1 0.617 0.1059 0.440 0.863
## 14 7 1 0.529 0.1220 0.336 0.831
## 22 4 1 0.396 0.1465 0.192 0.818
##
## sex=female
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 129 3 0.9767 0.0133 0.9511 1.000
## 1 126 6 0.9302 0.0224 0.8873 0.975
## 2 112 4 0.8970 0.0271 0.8455 0.952
## 3 98 2 0.8787 0.0295 0.8228 0.938
## 4 80 4 0.8348 0.0352 0.7685 0.907
## 5 60 4 0.7791 0.0425 0.7001 0.867
## 6 40 1 0.7596 0.0457 0.6752 0.855
## 7 37 3 0.6980 0.0541 0.5997 0.812
## 9 29 1 0.6740 0.0573 0.5705 0.796
## 12 24 3 0.5897 0.0677 0.4709 0.739
## 14 18 1 0.5570 0.0714 0.4332 0.716
## 18 15 1 0.5198 0.0757 0.3907 0.692
## 25 8 2 0.3899 0.0978 0.2385 0.637
## 28 4 1 0.2924 0.1118 0.1382 0.619
## 29 3 1 0.1949 0.1090 0.0651 0.583
## 41 2 1 0.0975 0.0879 0.0167 0.571
Among males, the survival probability falls below 0.5 between days 12 and 14, indicating an estimated median survival time of approximately 13 to 14 days. In contrast, females show a more gradual decline in survival, with the probability dropping below 0.5 between days 18 and 25, suggesting a longer median survival time of around 20 to 22 days. This indicates that female patients had better overall survival compared to males during the follow-up period.
Plot : sex
ggsurvplot(KM1.sex, data = stroke1, risk.table = TRUE,
linetype = c(1,2), pval = TRUE)
KM1.dm <- survfit(Surv(time = time, status3b == 'dead') ~ dm2cat,
type = "kaplan-meier", data = stroke1)
summary(KM1.dm)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ dm2cat,
## data = stroke1, type = "kaplan-meier")
##
## dm2cat=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 152 2 0.987 0.00924 0.969 1.000
## 1 148 9 0.927 0.02124 0.886 0.969
## 2 128 4 0.898 0.02503 0.850 0.948
## 3 105 2 0.881 0.02732 0.829 0.936
## 4 85 2 0.860 0.03035 0.803 0.922
## 5 56 5 0.783 0.04287 0.704 0.872
## 6 42 3 0.727 0.05054 0.635 0.833
## 7 36 2 0.687 0.05522 0.587 0.804
## 9 29 1 0.663 0.05817 0.558 0.788
## 10 25 1 0.637 0.06160 0.527 0.770
## 12 21 4 0.515 0.07391 0.389 0.683
## 18 14 1 0.479 0.07726 0.349 0.657
## 22 10 1 0.431 0.08304 0.295 0.629
## 25 5 1 0.345 0.10174 0.193 0.615
## 29 4 1 0.258 0.10672 0.115 0.581
##
## dm2cat=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 74 1 0.986 0.0134 0.9605 1.000
## 1 73 1 0.973 0.0189 0.9367 1.000
## 3 64 2 0.943 0.0280 0.8893 0.999
## 4 48 2 0.903 0.0382 0.8315 0.981
## 7 22 2 0.821 0.0653 0.7026 0.960
## 9 15 1 0.766 0.0807 0.6235 0.942
## 14 9 2 0.596 0.1234 0.3973 0.894
## 25 5 1 0.477 0.1453 0.2625 0.867
## 28 2 1 0.238 0.1836 0.0527 1.000
## 41 1 1 0.000 NaN NA NA
ggsurvplot(KM1.dm, data = stroke1, risk.table = TRUE,
linetype = c(1,2), pval = TRUE)
KM1.hpt <- survfit(Surv(time = time, status3b == 'dead') ~ hpt2cat,
type = "kaplan-meier", data = stroke1)
summary(KM1.hpt)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ hpt2cat,
## data = stroke1, type = "kaplan-meier")
##
## hpt2cat=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 78 1 0.987 0.0127 0.9625 1.000
## 1 76 3 0.948 0.0252 0.9001 0.999
## 2 65 2 0.919 0.0318 0.8588 0.983
## 3 53 1 0.902 0.0356 0.8346 0.974
## 4 42 1 0.880 0.0407 0.8039 0.964
## 5 25 3 0.775 0.0675 0.6530 0.919
## 6 20 1 0.736 0.0744 0.6036 0.897
## 7 19 2 0.658 0.0844 0.5122 0.846
## 9 14 1 0.611 0.0905 0.4574 0.817
## 12 10 1 0.550 0.1000 0.3854 0.786
## 18 8 1 0.481 0.1086 0.3094 0.749
## 41 2 1 0.241 0.1787 0.0562 1.000
##
## hpt2cat=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 148 2 0.986 0.00949 0.9681 1.000
## 1 145 7 0.939 0.01975 0.9009 0.978
## 2 132 2 0.925 0.02186 0.8828 0.968
## 3 116 3 0.901 0.02528 0.8525 0.952
## 4 91 3 0.871 0.02970 0.8147 0.931
## 5 67 2 0.845 0.03403 0.7809 0.914
## 6 47 2 0.809 0.04099 0.7326 0.894
## 7 39 2 0.768 0.04826 0.6786 0.868
## 9 30 1 0.742 0.05300 0.6451 0.854
## 10 27 1 0.715 0.05773 0.6099 0.837
## 12 24 3 0.625 0.06984 0.5023 0.778
## 14 16 2 0.547 0.08004 0.4107 0.729
## 22 12 1 0.501 0.08537 0.3592 0.700
## 25 7 2 0.358 0.10512 0.2015 0.637
## 28 3 1 0.239 0.12006 0.0891 0.640
## 29 2 1 0.119 0.10359 0.0218 0.654
ggsurvplot(KM1.hpt, data = stroke1, risk.table = TRUE,
linetype = c(1,2), pval = TRUE)
KM1.dyslipid <- survfit(Surv(time = time, status3b == 'dead') ~ dyslipid2cat,
type = "kaplan-meier", data = stroke1)
summary(KM1.dyslipid)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ dyslipid2cat,
## data = stroke1, type = "kaplan-meier")
##
## dyslipid2cat=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 199 3 0.985 0.00864 0.9681 1.000
## 1 194 9 0.939 0.01700 0.9065 0.973
## 2 176 4 0.918 0.01968 0.8801 0.957
## 3 149 4 0.893 0.02268 0.8499 0.939
## 4 119 4 0.863 0.02643 0.8129 0.917
## 5 86 5 0.813 0.03308 0.7507 0.881
## 6 64 3 0.775 0.03815 0.7036 0.853
## 7 56 4 0.720 0.04434 0.6377 0.812
## 9 43 2 0.686 0.04818 0.5979 0.787
## 10 37 1 0.668 0.05032 0.5759 0.774
## 12 33 4 0.587 0.05826 0.4829 0.713
## 14 24 2 0.538 0.06283 0.4277 0.676
## 18 20 1 0.511 0.06519 0.3978 0.656
## 22 16 1 0.479 0.06849 0.3619 0.634
## 25 10 2 0.383 0.08168 0.2523 0.582
## 28 6 1 0.319 0.08962 0.1842 0.553
## 29 5 1 0.255 0.09167 0.1264 0.516
## 41 3 1 0.170 0.09256 0.0587 0.494
##
## dyslipid2cat=1
## time n.risk n.event survival std.err lower 95% CI
## 1.0000 27.0000 1.0000 0.9630 0.0363 0.8943
## upper 95% CI
## 1.0000
ggsurvplot(KM1.dyslipid, data = stroke1, risk.table = TRUE,
linetype = c(1,2), pval = TRUE)
KM1.icd10 <- survfit(Surv(time = time, status3b == 'dead') ~ icd10cat2,
type = "kaplan-meier", data = stroke1)
summary(KM1.icd10)
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ icd10cat2,
## data = stroke1, type = "kaplan-meier")
##
## icd10cat2=CI,Others
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 147 3 0.980 0.0117 0.957 1.000
## 2 132 3 0.957 0.0171 0.924 0.991
## 4 78 2 0.933 0.0239 0.887 0.981
## 5 43 1 0.911 0.0317 0.851 0.975
## 6 27 1 0.877 0.0450 0.793 0.970
## 7 22 2 0.798 0.0676 0.676 0.942
## 12 9 2 0.620 0.1224 0.421 0.913
## 14 5 1 0.496 0.1480 0.277 0.890
## 28 3 1 0.331 0.1673 0.123 0.891
## 41 1 1 0.000 NaN NA NA
##
## icd10cat2=Haemorrhagic
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 77 3 0.961 0.0221 0.9188 1.000
## 1 74 7 0.870 0.0383 0.7982 0.949
## 2 65 1 0.857 0.0400 0.7818 0.939
## 3 61 4 0.801 0.0462 0.7150 0.896
## 4 55 2 0.771 0.0489 0.6814 0.873
## 5 49 4 0.708 0.0541 0.6100 0.823
## 6 40 2 0.673 0.0569 0.5703 0.794
## 7 36 2 0.636 0.0596 0.5290 0.764
## 9 32 2 0.596 0.0621 0.4858 0.731
## 10 28 1 0.575 0.0634 0.4629 0.713
## 12 25 2 0.529 0.0662 0.4137 0.676
## 14 20 1 0.502 0.0679 0.3853 0.655
## 18 16 1 0.471 0.0706 0.3510 0.632
## 22 12 1 0.432 0.0748 0.3073 0.606
## 25 7 2 0.308 0.0910 0.1728 0.550
## 29 3 1 0.206 0.1036 0.0766 0.552
ggsurvplot(KM1.icd10, data = stroke1, risk.table = TRUE,
linetype = c(1,2), pval = TRUE)
estimation of the survival probabality at that specific time of follow-up:
stroke1 %>% group_by(status3b) %>%
summarize(min.dur = min(time), max.dur = max(time))
## # A tibble: 2 × 3
## status3b min.dur max.dur
## <fct> <dbl> <dbl>
## 1 alive 0 61
## 2 dead 0 41
summary(KM1, times = c(20, 40, 60))
## Call: survfit(formula = Surv(time = time, status3b == "dead") ~ 1,
## data = stroke1, type = "kaplan-meier")
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 20 18 47 0.527 0.0649 0.4138 0.671
## 40 3 5 0.263 0.0941 0.1308 0.531
## 60 1 1 0.176 0.0953 0.0606 0.509
Comparing the survival estimates between levels of a group (categorical) variable Log Rank Test
The null hypothesis : survival estimates between levels or groups are not different.
logrank.sex <- survdiff(Surv(time = time, status3b == 'dead') ~ sex,
data = stroke1, rho = 0)
logrank.sex
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ sex,
## data = stroke1, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=male 97 15 19.7 1.126 1.89
## sex=female 129 38 33.3 0.666 1.89
##
## Chisq= 1.9 on 1 degrees of freedom, p= 0.2
The survival estimates between the gender group are not different( p value : 0.2)
logrank.dm <- survdiff(Surv(time = time, status3b == 'dead') ~ dm2cat,
data = stroke1, rho = 0)
logrank.dm
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ dm2cat,
## data = stroke1, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## dm2cat=0 152 39 33.9 0.752 2.19
## dm2cat=1 74 14 19.1 1.340 2.19
##
## Chisq= 2.2 on 1 degrees of freedom, p= 0.1
The survival estimates between the DM status are not different( p value : 0.1)
logrank.hpt <- survdiff(Surv(time = time, status3b == 'dead') ~ hpt2cat,
data = stroke1, rho = 0)
logrank.hpt
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ hpt2cat,
## data = stroke1, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## hpt2cat=0 78 18 17.7 0.00458 0.00729
## hpt2cat=1 148 35 35.3 0.00230 0.00729
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
The survival estimates between the hpt status are not different( p value : 0.9)
logrank.dyslipid <- survdiff(Surv(time = time, status3b == 'dead') ~ hpt2cat,
data = stroke1, rho = 0)
logrank.dyslipid
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ hpt2cat,
## data = stroke1, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## hpt2cat=0 78 18 17.7 0.00458 0.00729
## hpt2cat=1 148 35 35.3 0.00230 0.00729
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
The survival estimates between the dyslipidemia status are not different( p value : 0.9)
logrank.icd10cat <- survdiff(Surv(time = time, status3b == 'dead') ~ icd10cat2,
data = stroke1, rho = 0)
logrank.icd10cat
## Call:
## survdiff(formula = Surv(time = time, status3b == "dead") ~ icd10cat2,
## data = stroke1, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## icd10cat2=CI,Others 149 17 25.8 3.02 6.87
## icd10cat2=Haemorrhagic 77 36 27.2 2.87 6.87
##
## Chisq= 6.9 on 1 degrees of freedom, p= 0.009
The Survival estimates between stroke type ( CI/other and haemorhagic ) are different at the level of 5% significance (p-value = 0.009).
Univariable Simple Cox PH Regression
cox.gcs <- coxph(Surv(time = time, event = status3b == 'dead') ~ gcs,
data = stroke1)
summary(cox.gcs)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## gcs, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gcs -0.18784 0.82875 0.03227 -5.82 5.89e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## gcs 0.8288 1.207 0.7779 0.8829
##
## Concordance= 0.783 (se = 0.035 )
## Likelihood ratio test= 34.14 on 1 df, p=5e-09
## Wald test = 33.87 on 1 df, p=6e-09
## Score (logrank) test = 39.48 on 1 df, p=3e-10
The simple cox PH model with covariate gcs shows that with each one unit increase in gcs, the crude log hazard for death changes by factor of -0.192. The pvalue is significant. exponentiating the log HR, the simple cox shows that with increase one unot of gcs, the crude risk for death decreases for about 17% and the decrease are between 95% CI ( 0.774, 0.8798).
cox.age <- coxph(Surv(time = time, event = status3b == 'dead') ~ age2,
data = stroke1)
summary(cox.age)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## age2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age2 0.02441 1.02471 0.01123 2.173 0.0298 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age2 1.025 0.9759 1.002 1.048
##
## Concordance= 0.61 (se = 0.052 )
## Likelihood ratio test= 4.82 on 1 df, p=0.03
## Wald test = 4.72 on 1 df, p=0.03
## Score (logrank) test = 4.69 on 1 df, p=0.03
The simple cox PH model with covariate age shows that with each one unit increase in age, the crude log hazard for death changes by factor of 1.025.
cox.sex <- coxph(Surv(time = time, event = status3b == 'dead') ~ sex,
data = stroke1)
summary(cox.sex)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## sex, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexfemale 0.4179 1.5187 0.3071 1.361 0.174
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexfemale 1.519 0.6584 0.832 2.772
##
## Concordance= 0.572 (se = 0.038 )
## Likelihood ratio test= 1.95 on 1 df, p=0.2
## Wald test = 1.85 on 1 df, p=0.2
## Score (logrank) test = 1.88 on 1 df, p=0.2
in simple cox PH regression , the covariote sex is not significant ( p-value 0.174)
cox.icd10cat2 <- coxph(Surv(time = time, event = status3b == 'dead') ~ icd10cat2,
data = stroke1)
summary(cox.icd10cat2)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## icd10cat2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## icd10cat2Haemorrhagic 0.7922 2.2082 0.3090 2.564 0.0104 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## icd10cat2Haemorrhagic 2.208 0.4528 1.205 4.047
##
## Concordance= 0.65 (se = 0.041 )
## Likelihood ratio test= 6.95 on 1 df, p=0.008
## Wald test = 6.57 on 1 df, p=0.01
## Score (logrank) test = 6.83 on 1 df, p=0.009
The simple Cox Ph model woth covariate stroke type shows that patients with haemorhagic stroke has the crude log hazard for death 2.208 times compared to patients with inchaemic type/others ( pvalue = 0.014) by exponentiate the hazard log hazard, we will get hazard ratio.
tidy(cox.icd10cat2,
exponentiate = TRUE,
conf.int = TRUE)
## # A tibble: 1 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 icd10cat2Haemorrhagic 2.21 0.309 2.56 0.0104 1.21 4.05
Patients with haemorhagic stroke shas Hazard ration of 2.208 compared to patient with inschameic stroke ( pvalue = 0.0104 and 95%CI 1.205,4.047)
cox.hpt2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ hpt2cat,
data = stroke1)
summary(cox.hpt2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## hpt2cat, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## hpt2cat1 -0.02418 0.97611 0.29336 -0.082 0.934
##
## exp(coef) exp(-coef) lower .95 upper .95
## hpt2cat1 0.9761 1.024 0.5493 1.735
##
## Concordance= 0.511 (se = 0.041 )
## Likelihood ratio test= 0.01 on 1 df, p=0.9
## Wald test = 0.01 on 1 df, p=0.9
## Score (logrank) test = 0.01 on 1 df, p=0.9
cox.dyslipid2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ dyslipid2cat,
data = stroke1)
summary(cox.dyslipid2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## dyslipid2cat, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dyslipid2cat1 -1.4174 0.2424 1.0137 -1.398 0.162
##
## exp(coef) exp(-coef) lower .95 upper .95
## dyslipid2cat1 0.2424 4.126 0.03323 1.767
##
## Concordance= 0.534 (se = 0.02 )
## Likelihood ratio test= 3.17 on 1 df, p=0.08
## Wald test = 1.95 on 1 df, p=0.2
## Score (logrank) test = 2.3 on 1 df, p=0.1
cox.dm2cat <- coxph(Surv(time = time, event = status3b == 'dead') ~ dm2cat,
data = stroke1)
summary(cox.dm2cat)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## dm2cat, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## dm2cat1 -0.4618 0.6302 0.3127 -1.477 0.14
##
## exp(coef) exp(-coef) lower .95 upper .95
## dm2cat1 0.6302 1.587 0.3414 1.163
##
## Concordance= 0.574 (se = 0.035 )
## Likelihood ratio test= 2.33 on 1 df, p=0.1
## Wald test = 2.18 on 1 df, p=0.1
## Score (logrank) test = 2.22 on 1 df, p=0.1
for hypertension status, DM status , and dyslipidemia status , the covariates are not significant.
cox.mv <- coxph(Surv(time = time, event = status3b == 'dead') ~ gcs +
age2 + icd10cat2, data = stroke1)
summary(cox.mv)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## gcs + age2 + icd10cat2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gcs -0.18328 0.83253 0.03364 -5.449 5.07e-08 ***
## age2 0.03169 1.03220 0.01120 2.829 0.00468 **
## icd10cat2Haemorrhagic 0.47526 1.60843 0.31408 1.513 0.13024
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## gcs 0.8325 1.2012 0.7794 0.8893
## age2 1.0322 0.9688 1.0098 1.0551
## icd10cat2Haemorrhagic 1.6084 0.6217 0.8691 2.9768
##
## Concordance= 0.817 (se = 0.03 )
## Likelihood ratio test= 43.55 on 3 df, p=2e-09
## Wald test = 41.91 on 3 df, p=4e-09
## Score (logrank) test = 48.65 on 3 df, p=2e-10
Numerical and Numerical
cox.gcs.age <- coxph(Surv(time = time, event = status3b == 'dead') ~ gcs +
age2 + icd10cat2 + gcs:age2, data = stroke1)
summary(cox.gcs.age)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## gcs + age2 + icd10cat2 + gcs:age2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gcs -0.366160 0.693392 0.180317 -2.031 0.0423 *
## age2 0.006961 1.006986 0.026258 0.265 0.7909
## icd10cat2Haemorrhagic 0.491785 1.635233 0.320404 1.535 0.1248
## gcs:age2 0.002946 1.002950 0.002839 1.038 0.2995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## gcs 0.6934 1.4422 0.4870 0.9873
## age2 1.0070 0.9931 0.9565 1.0602
## icd10cat2Haemorrhagic 1.6352 0.6115 0.8727 3.0641
## gcs:age2 1.0030 0.9971 0.9974 1.0085
##
## Concordance= 0.814 (se = 0.03 )
## Likelihood ratio test= 44.62 on 4 df, p=5e-09
## Wald test = 38.96 on 4 df, p=7e-08
## Score (logrank) test = 49.75 on 4 df, p=4e-10
Numerical and Categorical
cox.gcs.icd10cat2 <- coxph(Surv(time = time, event = status3b == 'dead') ~ gcs +
age2 + icd10cat2 + gcs:icd10cat2, data = stroke1)
summary(cox.gcs.icd10cat2)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## gcs + age2 + icd10cat2 + gcs:icd10cat2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gcs -0.15645 0.85518 0.05424 -2.884 0.00392 **
## age2 0.03216 1.03268 0.01118 2.876 0.00403 **
## icd10cat2Haemorrhagic 0.88815 2.43064 0.72681 1.222 0.22172
## gcs:icd10cat2Haemorrhagic -0.04503 0.95597 0.07018 -0.642 0.52112
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## gcs 0.8552 1.1694 0.7689 0.9511
## age2 1.0327 0.9683 1.0103 1.0556
## icd10cat2Haemorrhagic 2.4306 0.4114 0.5849 10.1015
## gcs:icd10cat2Haemorrhagic 0.9560 1.0461 0.8331 1.0969
##
## Concordance= 0.819 (se = 0.029 )
## Likelihood ratio test= 43.97 on 4 df, p=7e-09
## Wald test = 43.19 on 4 df, p=9e-09
## Score (logrank) test = 50.58 on 4 df, p=3e-10
anova(cox.mv, cox.gcs.age)
## Analysis of Deviance Table
## Cox model: response is Surv(time = time, event = status3b == "dead")
## Model 1: ~ gcs + age2 + icd10cat2
## Model 2: ~ gcs + age2 + icd10cat2 + gcs:age2
## loglik Chisq Df Pr(>|Chi|)
## 1 -206.82
## 2 -206.29 1.0698 1 0.301
anova(cox.mv, cox.gcs.icd10cat2)
## Analysis of Deviance Table
## Cox model: response is Surv(time = time, event = status3b == "dead")
## Model 1: ~ gcs + age2 + icd10cat2
## Model 2: ~ gcs + age2 + icd10cat2 + gcs:icd10cat2
## loglik Chisq Df Pr(>|Chi|)
## 1 -206.82
## 2 -206.61 0.4189 1 0.5175
In model comparison , we compare between model with interaction term. We observe the same result ( p value >0.05). We decide not to add interaction term in the model (parsimonous model)
Age2 and gcs
ggcoxfunctional(Surv(time, status3b == "dead") ~ age2 + gcs, data = stroke1)
## Warning: arguments formula is deprecated; will be removed in the next version;
## please use fit instead.
Linearity assummed
The main assumption in Cox PH regression is that the estimated hazard is proportional across the follow-up time.
prop.h.km <- cox.zph(cox.mv, transform = 'km', global = TRUE)
prop.h.km
## chisq df p
## gcs 0.0337 1 0.854
## age2 0.8474 1 0.357
## icd10cat2 3.2547 1 0.071
## GLOBAL 4.2053 3 0.240
plot(prop.h.km)
prop.h.rank <- cox.zph(cox.mv, transform = 'rank')
prop.h.rank
## chisq df p
## gcs 2.345 1 0.126
## age2 0.435 1 0.510
## icd10cat2 2.927 1 0.087
## GLOBAL 5.132 3 0.162
plot(prop.h.rank)
final model : cox.mv
We can use residuals to assess for model fitness. They are useful to check for overall model fitness or for individual subjects fitness. The residuals can indicate the presence of outliers or influential subjects in our model.
residuals() can be calculated to produce martingale, deviance, score or Schoenfeld residuals for a Cox proportional hazards model.
1.1 Score Residuals
score.cox <- resid(cox.mv, type = "score")
head(score.cox)
## gcs age2 icd10cat2Haemorrhagic
## 1 5.7824817 6.3640609 -0.58119140
## 2 -0.2464467 0.0410133 0.02432698
## 3 -1.9912752 -0.4886707 -0.07679850
## 4 -2.4152637 -4.5979977 -0.19893897
## 5 -0.7692424 -0.3278447 -0.04856296
## 6 -0.8077002 8.5290419 0.16910526
1.2 Martingale residuals
marti.cox <- resid(cox.mv, type = "martingale")
head(marti.cox)
## 1 2 3 4 5 6
## 0.94290458 -0.04140021 -0.28417583 -0.77708991 -0.12505775 0.47075406
1.3 Schoenfeld residuals
schoen.cox <- resid(cox.mv, type = "schoenfeld")
head(schoen.cox)
## gcs age2 icd10cat2Haemorrhagic
## 0 -4.591998 -8.317883 0.3859729
## 0 2.408002 -11.317883 0.3859729
## 0 -5.591998 -18.317883 0.3859729
## 1 2.062987 2.147824 0.4120769
## 1 -5.937013 -3.852176 -0.5879231
## 1 -5.937013 16.147824 0.4120769
1.4 Scaled Schoenfeld residuals
sschoen.cox <- resid(cox.mv, type = "scaledsch")
head(sschoen.cox)
## gcs age2 icd10cat2Haemorrhagic
## 0 -0.397092922 -0.0079317873 1.710203
## 0 0.027059479 -0.0380863223 2.536873
## 0 -0.442499182 -0.0730107090 1.348059
## 1 -0.009896065 0.0526213599 2.944268
## 1 -0.609036105 0.0009486449 -3.449363
## 1 -0.510036647 0.1574209009 2.246893
1.5 dfbeta
dfbeta.cox <- resid(cox.mv, type = "dfbeta")
head(dfbeta.cox)
## [,1] [,2] [,3]
## 1 0.0049629628 3.834251e-04 -0.040543099
## 2 -0.0002211700 2.266179e-05 0.001822086
## 3 -0.0024253417 -4.054244e-05 -0.012605425
## 4 -0.0030874383 -5.987001e-04 -0.027493473
## 5 -0.0009787822 -4.146322e-05 -0.006794891
## 6 -0.0007394547 1.167581e-03 0.018494869
plot(stroke1$gcs, score.cox[,2], ylab="Score residuals")
plot(stroke1$age2, score.cox[,1], ylab="Score residuals")
Plot to identify the outliers using martingale residuals
plot(stroke1$age2, marti.cox, ylab = "Martingale residuals for age")
plot(marti.cox, type = 'h', main = "Martingale residuals", ylab = "dfbetas")
Or , we use the augment() function to do similar tasks as above. The resulting datasets consists of - the fitted variable
the std error of the fitted variable
the residuals
pred.cox.mv <- augment(cox.mv, data = stroke1)
pred.cox.mv
## # A tibble: 226 × 12
## time status3b age2 gcs sex dm2cat icd10cat2 hpt2cat dyslipid2cat
## <dbl> <fct> <dbl> <dbl> <fct> <fct> <fct> <fct> <fct>
## 1 4 dead 69 15 female 1 CI,Others 1 0
## 2 3 alive 64 15 male 0 CI,Others 1 0
## 3 16 alive 63 15 female 1 Haemorrhagic 1 1
## 4 23 alive 67 11 female 0 Haemorrhagic 1 0
## 5 5 alive 66 15 male 0 Haemorrhagic 1 0
## 6 4 dead 78 7 female 0 Haemorrhagic 0 0
## 7 22 alive 55 5 female 1 Haemorrhagic 1 0
## 8 14 dead 65 13 female 1 CI,Others 1 0
## 9 4 alive 67 15 male 0 CI,Others 1 0
## 10 2 alive 56 15 male 0 CI,Others 0 0
## # ℹ 216 more rows
## # ℹ 3 more variables: .fitted <dbl>, .se.fit <dbl>, .resid <dbl>
From the Cox PH , we can predict 1. The linear predictor 2. The risk 3. The expected number of events given the covariates and follow up time
We make a new data and name them as newdata using expand.grid() function:
our model
summary(cox.mv)
## Call:
## coxph(formula = Surv(time = time, event = status3b == "dead") ~
## gcs + age2 + icd10cat2, data = stroke1)
##
## n= 226, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## gcs -0.18328 0.83253 0.03364 -5.449 5.07e-08 ***
## age2 0.03169 1.03220 0.01120 2.829 0.00468 **
## icd10cat2Haemorrhagic 0.47526 1.60843 0.31408 1.513 0.13024
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## gcs 0.8325 1.2012 0.7794 0.8893
## age2 1.0322 0.9688 1.0098 1.0551
## icd10cat2Haemorrhagic 1.6084 0.6217 0.8691 2.9768
##
## Concordance= 0.817 (se = 0.03 )
## Likelihood ratio test= 43.55 on 3 df, p=2e-09
## Wald test = 41.91 on 3 df, p=4e-09
## Score (logrank) test = 48.65 on 3 df, p=2e-10
tidy(cox.mv)
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 gcs -0.183 0.0336 -5.45 0.0000000507
## 2 age2 0.0317 0.0112 2.83 0.00468
## 3 icd10cat2Haemorrhagic 0.475 0.314 1.51 0.130
stroke1 %>% select(gcs, age2, icd10cat2) %>% summary()
## gcs age2 icd10cat2
## Min. : 3.00 Min. :20.00 CI,Others :149
## 1st Qu.:10.00 1st Qu.:52.00 Haemorrhagic: 77
## Median :15.00 Median :61.00
## Mean :12.44 Mean :60.82
## 3rd Qu.:15.00 3rd Qu.:68.75
## Max. :15.00 Max. :94.00
new_data <- expand.grid(gcs = c(5, 10, 12),
age2 = c(40, 50, 60),
icd10cat2 = c('CI,Others', 'Haemorrhagic'))
new_data
## gcs age2 icd10cat2
## 1 5 40 CI,Others
## 2 10 40 CI,Others
## 3 12 40 CI,Others
## 4 5 50 CI,Others
## 5 10 50 CI,Others
## 6 12 50 CI,Others
## 7 5 60 CI,Others
## 8 10 60 CI,Others
## 9 12 60 CI,Others
## 10 5 40 Haemorrhagic
## 11 10 40 Haemorrhagic
## 12 12 40 Haemorrhagic
## 13 5 50 Haemorrhagic
## 14 10 50 Haemorrhagic
## 15 12 50 Haemorrhagic
## 16 5 60 Haemorrhagic
## 17 10 60 Haemorrhagic
## 18 12 60 Haemorrhagic
model : cox.mv
predict(cox.mv, newdata = new_data, type = 'lp')
## 1 2 3 4 5 6
## 0.70287990 -0.21352530 -0.58008739 1.01979686 0.10339165 -0.26317043
## 7 8 9 10 11 12
## 1.33671381 0.42030861 0.05374653 1.17813573 0.26173053 -0.10483155
## 13 14 15 16 17 18
## 1.49505269 0.57864749 0.21208541 1.81196965 0.89556445 0.52900236
augment(cox.mv, newdata = new_data)
## # A tibble: 18 × 5
## gcs age2 icd10cat2 .fitted .se.fit
## <dbl> <dbl> <fct> <dbl> <dbl>
## 1 5 40 CI,Others 0.703 0.329
## 2 10 40 CI,Others -0.214 0.242
## 3 12 40 CI,Others -0.580 0.233
## 4 5 50 CI,Others 1.02 0.270
## 5 10 50 CI,Others 0.103 0.141
## 6 12 50 CI,Others -0.263 0.121
## 7 5 60 CI,Others 1.34 0.250
## 8 10 60 CI,Others 0.420 0.0818
## 9 12 60 CI,Others 0.0537 0.0167
## 10 5 40 Haemorrhagic 1.18 0.391
## 11 10 40 Haemorrhagic 0.262 0.356
## 12 12 40 Haemorrhagic -0.105 0.364
## 13 5 50 Haemorrhagic 1.50 0.355
## 14 10 50 Haemorrhagic 0.579 0.312
## 15 12 50 Haemorrhagic 0.212 0.319
## 16 5 60 Haemorrhagic 1.81 0.353
## 17 10 60 Haemorrhagic 0.896 0.305
## 18 12 60 Haemorrhagic 0.529 0.310
predict(cox.mv, newdata = new_data, type = 'risk')
## 1 2 3 4 5 6 7 8
## 2.0195605 0.8077317 0.5598494 2.7726315 1.1089256 0.7686109 3.8065140 1.5224313
## 9 10 11 12 13 14 15 16
## 1.0552171 3.2483128 1.2991764 0.9004762 4.4595715 1.7836244 1.2362535 6.1224947
## 17 18
## 2.4487176 1.6972382
new_data2 <- expand.grid(status3b = 'dead', time = c(20, 40, 50))
new_data2
## status3b time
## 1 dead 20
## 2 dead 40
## 3 dead 50
Combine new_data and new_data2
new_data3 <- data.frame(new_data, new_data2)
head(new_data3)
## gcs age2 icd10cat2 status3b time
## 1 5 40 CI,Others dead 20
## 2 10 40 CI,Others dead 40
## 3 12 40 CI,Others dead 50
## 4 5 50 CI,Others dead 20
## 5 10 50 CI,Others dead 40
## 6 12 50 CI,Others dead 50
the predicted number of events are
pred.exp <- predict(cox.mv, newdata = new_data3, type = 'expected')
cbind(new_data3, pred.exp)
## gcs age2 icd10cat2 status3b time pred.exp
## 1 5 40 CI,Others dead 20 0.5690970
## 2 10 40 CI,Others dead 40 0.4934196
## 3 12 40 CI,Others dead 50 0.5291371
## 4 5 50 CI,Others dead 20 0.7813067
## 5 10 50 CI,Others dead 40 0.6774101
## 6 12 50 CI,Others dead 50 0.7264463
## 7 5 60 CI,Others dead 20 1.0726470
## 8 10 60 CI,Others dead 40 0.9300086
## 9 12 60 CI,Others dead 50 0.9973298
## 10 5 40 Haemorrhagic dead 20 0.9153501
## 11 10 40 Haemorrhagic dead 40 0.7936287
## 12 12 40 Haemorrhagic dead 50 0.8510777
## 13 5 50 Haemorrhagic dead 20 1.2566737
## 14 10 50 Haemorrhagic dead 40 1.0895638
## 15 12 50 Haemorrhagic dead 50 1.1684348
## 16 5 60 Haemorrhagic dead 20 1.7252730
## 17 10 60 Haemorrhagic dead 40 1.4958497
## 18 12 60 Haemorrhagic dead 50 1.6041308
Conclusion final table for multivariable cox PH
tbl_regression(cox.mv, exponentiate = TRUE)
| Characteristic | HR1 | 95% CI1 | p-value |
|---|---|---|---|
| earliest Glasgow Coma Scale | 0.83 | 0.78, 0.89 | <0.001 |
| age in years | 1.03 | 1.01, 1.06 | 0.005 |
| 0=Cerebral Isch and others, 1=Haemorrhagic | |||
| CI,Others | — | — | |
| Haemorrhagic | 1.61 | 0.87, 2.98 | 0.13 |
| 1 HR = Hazard Ratio, CI = Confidence Interval | |||
Interpretation :
Every 1 unit increase in gcs, the patients has 17% lower risk of dying , adjusted to age and gcs ( pvalue<0.001, 95% CI 0.78, 0.89)
Every 1-year increase in age is expected to increase the hazard of dying by 1.03 times (pvalue 0.005, 95% CI 1.01, 1.06), adjusted to gcs and stroke type
The simple Cox Ph model with covariate stroke type shows that patients with haemorhagic stroke has the crude log hazard for death 2.208 times compared to patients with inchaemic type/others ( pvalue = 0.014) by exponentiate the hazard log hazard, we will get hazard ratio. However, when being fitted to final model, covariate stroke type loses its significance.